To do

  • References as a global summary in the end
  • Present only variables that are actually used in the SEM
  • If there is time at the end: add some manually calculated (very simple) examples and a bit of theory behind graphs, tracing rules and such
  • Look for possible exercises and a way to optimally include them

Dataset

The data for following exercises stem from the publication Grassland ecosystem recovery after soil disturbance depends on nutrient supply rate and are publicly available at Dryad. The data were obtained during the long-term field experiment Cedar Creek LTER and target the effects of human disturbances on grassland ecosystem functioning and biodiversity.

Load data

rm(list = ls())
seabloom <- read.table(here("2_Modeling/Data_preparation/seabloom-2020-ele-dryad-data/cdr-e001-e002-output-data.csv"),
                       sep = ",", header = TRUE)

Explore data

dim(seabloom)
## [1] 5040   16
str(seabloom)
## 'data.frame':    5040 obs. of  16 variables:
##  $ exp       : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ field     : chr  "A" "A" "A" "A" ...
##  $ plot      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ disk      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ yr.plowed : int  1968 1968 1968 1968 1968 1968 1968 1968 1968 1968 ...
##  $ ntrt      : int  9 9 9 9 9 9 9 9 9 9 ...
##  $ nadd      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ other.add : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ year      : int  1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 ...
##  $ dur       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ precip.mm : num  879 858 876 994 859 ...
##  $ precip.gs : num  374 551 527 503 512 ...
##  $ mass.above: num  61.3 135.9 216.9 302.8 586.9 ...
##  $ rich      : int  13 15 14 14 16 14 8 7 9 13 ...
##  $ even      : num  0.463 0.156 0.204 0.14 0.269 ...
##  $ ens.pie   : num  6.02 2.34 2.85 1.96 4.31 ...

Exploring the data reveals 16 variables with each 5040 data points:

  • exp: treatments in split-plot design: 1 = disturbance (Control or Disked, 35 × 55 m plots) and 2 = nutrient addition (9 levels, 4 × 4 m plots)
  • field: three experimental fields A, B and C
  • plot: 54 plots within fields
  • disk: disking treatment (0 = intact at start of experiment, 1 = disked at start of experiment)
  • yr.plowed: last year field was plowed for agriculture (A: 1968, B: 1957 and C: 1934)
  • ntrt: nine levels representing different combinations of nitrogen (0 to 27.2 g N year-1 added as NH4NO3) and other nutrients (20 g m−2 year−1 P205; 20 g m−2 year−1 K20; 40 g m−2 year−1 CaCO3; 30.0 g m−2 year−1 MgSO4; 18 μg m−2 year−1 CuSO4; 37.7 μg m−2 year−1 ZnSO4; 15.3 μg m−2 year−1 CoCO2; 322 μg m−2 year−1 MnCl2 and 15.1 μg m−2 year−1 NaMoO4; details see Table S1 in publication). Nutrients were applied twice per year in mid-May and mid-June.
  • nadd: nitrogen additon rate (g/m2/yr)
  • other.add: other nutrient treatment (0 = control, 1 = other nutrients added)
  • year: sampling year
  • dur: duration of experiment
  • precip.mm: annual precipitation (mm)
  • precip.gs: growing season precipitation (mm)
  • mass.above: aboveground biomass (g/m2)
  • rich: species richness (species/0.3 m2)
  • even: Simpson’s evenness
  • ens.pie: effective number of species, (= probability of interspecific encounter decimal equivalent to inverse Simpson’s diversity)
  • origin: species origin (native or introduced)
  • duration: species lifespan (annual, bienniel, perennial)
  • functional.group: species functional group:
    • C3 = C3 grass
    • C4 = C4 grass
    • F = forb
    • L = legume
    • S = sedge

Overview

The pairs function yields an overview over the numerical data.

panel.points <- function(x, y) {
  points(x, y, cex = 0.1)
  abline(lm(y ~ x), lty = 2, col = "red")
  }

pairs(seabloom[, -c(1:3, 5:6, 8:10, 12, 16)],
      lower.panel = NULL, upper.panel = panel.points)

Materials and methods

Disturbance treatment

The disturbance treatment was replicated in the three old-fields (A, B and C) in a completely randomised block design (two treatments in each of three fields for a total of 6 35 × 55 m large plots). In April 1982, in each of the fields, one of these two 35 × 55 m areas was selected to be disturbed with a 45 cm diameter disk harrow pulled by a tractor 20 times in one direction, 20 times perpendicularly and 5 times diagonally to the first passes. Following the disking, the soil was hand raked to smooth the soil and remove any remaining vegetation, so that subsequent colonisation was solely from seeds or small rhizome fragments. Within each of the 6 large plots, the 54 small plots were arrayed in 6 × 9 grid with 1 m buffers between each plot. Aluminium flashing was buried to depth of 30 cm around each plot to prevent horizontal movement of nutrients and spreading of plants through vegetative growth.

Nutrient treatments

The nutrient treatments were replicated six times in a completely randomised design in each of the 35 × 55 m plots (54 4 × 4 m small plots) yielding 324 (6 x 54) plots. The analyses focuses on two nutrient treatments:

  1. Control (no nutrients; Treatment I) and
  2. Other Nutrients and 9.5 g of N (Treatment F)

Sampling and analysis

At peak biomass (mid-July to late August), all aboveground biomass was clipped in a 3 m by 10 cm strip (0.3 m2) in each plot. Note that there were 4 years when the disturbed plots were not sampled or only sampled in a single field. The biomass was sorted into dead, previous year’s growth (litter) and live, current year’s growth (live biomass). Live biomass was sorted to species, dried to constant mass at 40°C, and weighed to the nearest 0.01 g. We estimated total aboveground biomass as the summed biomass of all non-woody species in each 0.3 m2 sample, converted to g m-2. We excluded woody biomass, because our goal was to estimate annual productivity and most of the woody biomass is from previous year’s growth. Woody plant biomass composed less than 1% of total biomass across the data set.

Shorten this: Species richness is the number of species in each 0.3 m2 sample. We quantified plant diversity as the Effective Number of Species based on the Probability of Interspecific Encounter (ENSPIE), a measure of diversity that is more robust to the effects of sampling scale and less sensitive to the presence of rare species than species richness (Jost, 2006, 2007; Chase and Knight, 2013). ENSPIE is equivalent to the Inverse Simpson’s index of diversity which is calculated as \(1 / \sum_{i=1}^{S} p_i^2\) where S is the total number of species (i.e. species richness) and pi is the proportion of the community biomass reesented by species i (Jost, 2006, 2007; Chase and Knight, 2013). Simpson’s evenness (E) satisfies the main requirements of an evenness index (Smith and Wilson, 1996). In addition, it is directly related to ENSPIE through the relationship E = ENSPIE/S (Smith and Wilson, 1996), thus we can factor diversity directly into its richness and evenness components through the relationship ENSPIE = S*E.

Metamodel

A metamodel summarizes the concept behind a model and links it to theory. Here, the metamodel is visualized as a directed acyclic graph (DAG) which reads as: productivity (biomass) is directly influenced by the environment (nutrients, disturbance and precipitation) on the one hand and biodiversity (richness and evenness) on the other hand. The environment influences also biodiversity and thus, also has an indirect effect on productivity via biodiversity. Describe this interaction thingy!

Subset to one year

For simplicity, set the focus on only one year. Then, 240 observations remain.

seabloom <- seabloom[seabloom$year == 2000, ]
dim(seabloom)
## [1] 240  16

Linear models

First, implement the metamodel into a linear model (LM). For this, three models are necessary: one that accounts for the direct- and two for the indirect effects.

  • residuals
  • variance/covariance matrix

Direct effects

lm.dir <- lm(mass.above ~ nadd + precip.mm + rich + even,
             data = seabloom)
summary(lm.dir)
## 
## Call:
## lm(formula = mass.above ~ nadd + precip.mm + rich + even, data = seabloom)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -218.85  -59.09  -15.35   36.31  689.30 
## 
## Coefficients: (1 not defined because of singularities)
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  73.4044    38.5707   1.903 0.058242 .  
## nadd          6.7044     0.8345   8.034  4.5e-14 ***
## precip.mm         NA         NA      NA       NA    
## rich         12.7405     3.6505   3.490 0.000576 ***
## even         36.8867    40.5034   0.911 0.363379    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 100.4 on 236 degrees of freedom
## Multiple R-squared:  0.2153, Adjusted R-squared:  0.2053 
## F-statistic: 21.58 on 3 and 236 DF,  p-value: 2.187e-12

Problem: the estimates for precip.mm are all NAs. Reason is the focus on a single year (i.e. the year 2000) what fixes the value for precipitation.mm to 599.7 mm (i.e. the precipitation in 2000) and as it is impossible to predict anything from a single value, NA is returned.

summary(seabloom$precip.mm)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   599.7   599.7   599.7   599.7   599.7   599.7

Thus, precipitation.mm is excluded from the model from now on.

lm.dir <- lm(mass.above ~ nadd + rich + even, data = seabloom)
summary(lm.dir)
## 
## Call:
## lm(formula = mass.above ~ nadd + rich + even, data = seabloom)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -218.85  -59.09  -15.35   36.31  689.30 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  73.4044    38.5707   1.903 0.058242 .  
## nadd          6.7044     0.8345   8.034  4.5e-14 ***
## rich         12.7405     3.6505   3.490 0.000576 ***
## even         36.8867    40.5034   0.911 0.363379    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 100.4 on 236 degrees of freedom
## Multiple R-squared:  0.2153, Adjusted R-squared:  0.2053 
## F-statistic: 21.58 on 3 and 236 DF,  p-value: 2.187e-12

Indirect effects

To account for the indirect effects, two separate LMs are necessary: one with richness and one with evenness as response.

lm.rich <- lm(rich ~ nadd, data = seabloom)
summary(lm.rich)
## 
## Call:
## lm(formula = rich ~ nadd, data = seabloom)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0318 -1.7746 -0.5349  1.2254  8.9682 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.03177    0.19477  30.969  < 2e-16 ***
## nadd        -0.12856    0.01614  -7.965 6.81e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.216 on 238 degrees of freedom
## Multiple R-squared:  0.2105, Adjusted R-squared:  0.2071 
## F-statistic: 63.44 on 1 and 238 DF,  p-value: 6.81e-14
lm.even <- lm(even ~ nadd, data = seabloom)
summary(lm.even)
## 
## Call:
## lm(formula = even ~ nadd, data = seabloom)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.39021 -0.13808 -0.03255  0.11926  0.50176 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.494342   0.017554  28.161   <2e-16 ***
## nadd        0.003422   0.001455   2.353   0.0195 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1998 on 238 degrees of freedom
## Multiple R-squared:  0.02273,    Adjusted R-squared:  0.01862 
## F-statistic: 5.534 on 1 and 238 DF,  p-value: 0.01946

Conclusion

The direct effect model showed that in the year 2000 biomass was statistically significantly positively related to the nitrogen additon rate (nadd) and species richness (rich). Additionally, nitrogen addition rate had a statistically significantly negative influence on species richness and to a lesser extent, also on evenness.

Structural equation modeling

Structural equations aspire to represent cause-effect relationships and thus, they can be used to represent scientific, causal hypotheses. A key feature of structural equation models is the ability to investigate the networks of connections among system components (Grace et al. 2012).

To evaluate the SEMs, load the package lavaan. This is a powerful piece of software that relies on the computation of covariance matrices to fit the structural equations. It comes with full support for categorical data (any mixture of binary, ordinal and continuous observed variables), can handle both latent and composite variables, performs mediation analysis and calculates the magnitude of indirect effects (Rosseel 2012 + website).

library("lavaan")
## This is lavaan 0.6-8
## lavaan is FREE software! Please report any bugs.

Collinearity

Variables that are highly correlated with \(r > 0.85\) become redundant. Then, one of them can be dropped or they can be modeled summarized as a latent variable (Grace 2006).

Inspecting our set of variables show that correlations between all variables are fairly low.

round(cor(seabloom[, c(4, 7, 13:15)]), digits = 2)
##             disk  nadd mass.above  rich  even
## disk        1.00  0.00       0.05 -0.01  0.02
## nadd        0.00  1.00       0.41 -0.46  0.15
## mass.above  0.05  0.41       1.00  0.00 -0.02
## rich       -0.01 -0.46       0.00  1.00 -0.59
## even        0.02  0.15      -0.02 -0.59  1.00

An example SEM

This toy model shall illustrate the logic of SEM. Comparability is eased as it contains the same variables as the LM before. A huge benefit of SEM is that variables can appear as both predictors and responses what allows the evaluation of direct and indirect effects in one go (as shown in the directed acyclic graph (DAG) below).

In the syntax of lavaan, an arrow in a DAG is represented by a tilde and the model reads as:

simple <-
"mass.above ~ nadd + disk + rich + even
rich ~ nadd
even ~ nadd"

Normal distribution

lavaan uses a \(\chi^2\) test to compare the estimated to the observed covariance matrix to compute a goodness of fit for the SEM under the assumption that all observations are independent and all variables follow a (multivariate) normal distribution (Grace 2006).

Add:

  • QQ-plots to control for normality: check if this makes sense, check for multinormal distribution, ask Jim :)
  • estimators that deal with deviations from the normal distribution

Fit the model

fit.simple <- sem(simple, data = seabloom)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan
## WARNING: some observed variances are (at least) a factor 1000 times larger than
## others; use varTable(fit) to investigate

Oups, the algorithm converges with a warning. Kindly, it informs us how to fix this. So, let’s obey the software and run varTable(fit.simple). This reveals an enormous difference in magnitude between the variance of biomass (mass.above) and the other variables:

varTable(fit.simple)
##         name idx nobs    type exo user    mean       var nlev lnam
## 1 mass.above  13  240 numeric   0    0 211.003 12677.909    0     
## 2       rich  14  240 numeric   0    0   4.979     6.196    0     
## 3       even  15  240 numeric   0    0   0.522     0.041    0     
## 4       nadd   7  240 numeric   1    0   8.188    78.895    0     
## 5       disk   4  240 numeric   1    0   0.400     0.241    0

Scale variables

Change scale of only biomass to avoid ambiguity with standardized vs unstandardized variable interpretation

To remove this difference in magnitude between the variables, we scale them by setting their mean to zero and their variance to one (aka the standard- or \(z\)-score). To do so for several columns in a dataframe, call apply() on the columns of interest with the desired function (here, scale). The boxplots show that the variables are now centered around zero:

seabloom[, c(4, 7, 13:15)] <- apply(seabloom[, c(4, 7, 13:15)],  2, scale)
boxplot(seabloom[, c(4, 7, 13:15)])

Then, run the SEM again:

fit.simple <- sem(simple, data = seabloom)
summary(fit.simple, fit.measures = TRUE, rsq = TRUE)
## lavaan 0.6-8 ended normally after 12 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         9
##                                                       
##   Number of observations                           240
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                               104.782
##   Degrees of freedom                                 3
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               226.074
##   Degrees of freedom                                 9
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.531
##   Tucker-Lewis Index (TLI)                      -0.407
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -959.487
##   Loglikelihood unrestricted model (H1)       -907.096
##                                                       
##   Akaike (AIC)                                1936.974
##   Bayesian (BIC)                              1968.299
##   Sample-size adjusted Bayesian (BIC)         1939.772
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.376
##   90 Percent confidence interval - lower         0.316
##   90 Percent confidence interval - upper         0.439
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.141
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   mass.above ~                                        
##     nadd              0.529    0.065    8.161    0.000
##     disk              0.054    0.057    0.941    0.346
##     rich              0.281    0.064    4.381    0.000
##     even              0.065    0.058    1.119    0.263
##   rich ~                                              
##     nadd             -0.459    0.057   -7.998    0.000
##   even ~                                              
##     nadd              0.151    0.064    2.362    0.018
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .mass.above        0.779    0.071   10.954    0.000
##    .rich              0.786    0.072   10.954    0.000
##    .even              0.973    0.089   10.954    0.000
## 
## R-Square:
##                    Estimate
##     mass.above        0.233
##     rich              0.210
##     even              0.023

Goodness of fit

This time, the model converged, however, with poor fit:

  • The ratio of the test statistic and the degrees of freedom should be smaller than 2. Here, the ratio is 104.8 / 3 = 34.9, what indicates that the model is quite far away from a decent fit.
  • The \(p\)-value, which represents the probability of the data given our model (or no significant discrepancy between model and data) [JBG], should be larger than 0.05.
  • The comparative fit index (CFI) ranges between zero and one, whereas a value \(> .95\) is considered as a good fit indicator (Hu and Bentler 1999). Our CFI of 0.5 is fairly below this threshold.

Modification indices

To improve the model fit, we look for missing paths via modification indices. They indicate a drop in the model \(\chi^2\) value resulting from freeing fixed parameters via including a missing path. 3.84 is considered as the critical threshold, the “single-degree-of-freedom \(\chi^2\) criterion” [JBG].

modindices(fit.simple, minimum.value = 3.84)
##     lhs op        rhs     mi    epc sepc.lv sepc.all sepc.nox
## 15 rich ~~       even 84.811 -0.520  -0.520   -0.594   -0.594
## 16 rich  ~ mass.above 51.301 -4.922  -4.922   -4.969   -4.969
## 17 rich  ~       even 84.811 -0.534  -0.534   -0.534   -0.534
## 19 even  ~ mass.above 79.659 -2.227  -2.227   -2.248   -2.248
## 20 even  ~       rich 84.811 -0.661  -0.661   -0.661   -0.661

In the column mi (for modification index) we look for high values. Note, however, that the modification indices are often meaningless and further adaptations of the model based on their information needs to be based on theory.

In this example, the modification indices indicate–amongst other–a missing relation between richness and evenness. Including this relation into the model would improve its fit by the change in \(\chi^2\) by 84.811. –> still, 104.782 - 84.811 = 19.971, but the updated model says 0.143

So, let’s include a correlation between rich and even. This path is theoretically justified as richness and evenness can be assumed to have a common driver (Note: in SEM, a correlation between two variables points to an omitted common cause/variable that drives this correlation). With update() it is possible to directly incorporate the missing path into the specified model without rewriting it from scratch:

fit.simple.up <- update(fit.simple, add = "rich ~~ even")
summary(fit.simple.up, rsq = TRUE, fit.measures = TRUE)
## lavaan 0.6-8 ended normally after 18 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        10
##                                                       
##   Number of observations                           240
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 0.143
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.931
## 
## Model Test Baseline Model:
## 
##   Test statistic                               226.074
##   Degrees of freedom                                 9
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.039
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)               -907.167
##   Loglikelihood unrestricted model (H1)       -907.096
##                                                       
##   Akaike (AIC)                                1834.334
##   Bayesian (BIC)                              1869.141
##   Sample-size adjusted Bayesian (BIC)         1837.443
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.037
##   P-value RMSEA <= 0.05                          0.961
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.007
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   mass.above ~                                        
##     nadd              0.529    0.065    8.118    0.000
##     disk              0.054    0.057    0.941    0.346
##     rich              0.281    0.080    3.523    0.000
##     even              0.065    0.072    0.900    0.368
##   rich ~                                              
##     nadd             -0.459    0.057   -7.998    0.000
##   even ~                                              
##     nadd              0.151    0.064    2.362    0.018
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .rich ~~                                             
##    .even             -0.520    0.066   -7.916    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .mass.above        0.779    0.071   10.954    0.000
##    .rich              0.786    0.072   10.954    0.000
##    .even              0.973    0.089   10.954    0.000
## 
## R-Square:
##                    Estimate
##     mass.above        0.218
##     rich              0.210
##     even              0.023
# standardizedsolution(fit.simple.up)
# inspect(fit.simple.up, "r2")

modindices(fit.simple.up, minimum.value = 3)
## [1] lhs      op       rhs      mi       epc      sepc.lv  sepc.all sepc.nox
## <0 rows> (or 0-length row.names)

Now, the model has a decent fit with a ratio of test statistic and degrees of freedom much smaller than two (i.e. 0.07) and a \(p\)-value close to 1. Further, the comparative fit index (CFI) is now equal to 1 (i.e. 0.931091).

Another look at the modificatin indices shows that further modifications would yield only smallest improvements to the model fit:

modindices(fit.simple.up, minimum.value = 0)
##     lhs op        rhs    mi    epc sepc.lv sepc.all sepc.nox
## 12 nadd ~~       disk 0.000  0.000   0.000       NA    0.000
## 16 rich  ~ mass.above 0.003  0.046   0.046    0.046    0.046
## 18 rich  ~       disk 0.003  0.002   0.002    0.002    0.002
## 19 even  ~ mass.above 0.111  0.318   0.318    0.318    0.318
## 21 even  ~       disk 0.111  0.017   0.017    0.017    0.017
## 22 nadd  ~ mass.above 0.000  0.000   0.000    0.000    0.000
## 25 nadd  ~       disk 0.000  0.000   0.000    0.000    0.000
## 26 disk  ~ mass.above 0.585 -1.809  -1.809   -1.809   -1.809
## 27 disk  ~       rich 0.020 -0.008  -0.008   -0.008   -0.008
## 28 disk  ~       even 0.133  0.023   0.023    0.023    0.023
## 29 disk  ~       nadd 0.000  0.000   0.000    0.000    0.000

Summary output explained

  • Estimates, the raw unstandardized coefficients
  • Std.err, the standard error
  • Z-value, the analog to \(t\)-values derived from maximum likelihood estimation (ML)
  • P(>|z|), the probability of obtaining a \(z\) of the given value by chance
  • Regressions
  • Variances = explained variance for endogenous variables = estimates of the error variances
  • R-square = the \(1 - error\) variance in standardized terms

[JBG]

Standardized effects

  • significance testing of standardized estimates violates statistical principles as significance tests are based on unstandardized parameters -> use for interpretation only

[JB Grace]

Visualize the results

Issue: lavaanPlot was removed from CRAN :-O

The library lavaanPlot allows to simply and straight-forwardly visualize diagrams from lavaan objects. Since it was removed from CRAN begin of 2021, it needs to be downloaded and installed from the archive.

# install.packages("https://cran.r-project.org/src/contrib/Archive/lavaanPlot/lavaanPlot_0.6.0.tar.gz", 
#                  repos = NULL, type = "source")

Here, we select to display significance levels with asterisks.

library("lavaanPlot")

lavaanPlot(model = fit.simple.up,
           node_options = list(shape = "box", color = "gray",
                               fontname = "Helvetica"),
           edge_options = list(color = "black"),
           coefs = TRUE, covs = TRUE, stars = c("covs", "regress"))

Saturated model

A saturated model includes all possible paths. As a result, there are no degrees of freedom left and it is impossible to estimate the model fit…

  • Saturated models have links between all the variables (more strictly true, we have no model degrees of freedom). Saturated models represent a special class of model because they allow for everything to add up, meaning we can completely recover the observed matrix of covariances when our model is saturated. Unsaturated models have testable implications, however.
  • Under global estimation, our comparison is to a saturated model. The reason is that saturated models permit every covariance to be explained, so our \(F_{ML}\) fit function goes to zero
  • A saturated model has an \(F_{ML} = 0\), and thus, \(\chi^2 = 0\)

[JBG]

satur <-
"mass.above ~ nadd + rich + even + disk
rich ~ nadd + disk
even ~ nadd + disk

rich ~~ even"

fit.satur <- sem(satur, data = seabloom)
summary(fit.satur, rsq = TRUE)
## lavaan 0.6-8 ended normally after 18 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        12
##                                                       
##   Number of observations                           240
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   mass.above ~                                        
##     nadd              0.529    0.065    8.118    0.000
##     rich              0.281    0.080    3.523    0.000
##     even              0.065    0.072    0.900    0.368
##     disk              0.054    0.057    0.941    0.347
##   rich ~                                              
##     nadd             -0.459    0.057   -7.999    0.000
##     disk             -0.010    0.057   -0.179    0.858
##   even ~                                              
##     nadd              0.151    0.064    2.363    0.018
##     disk              0.024    0.064    0.374    0.708
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .rich ~~                                             
##    .even             -0.520    0.066   -7.916    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .mass.above        0.779    0.071   10.954    0.000
##    .rich              0.786    0.072   10.954    0.000
##    .even              0.973    0.089   10.954    0.000
## 
## R-Square:
##                    Estimate
##     mass.above        0.218
##     rich              0.211
##     even              0.023

Let’s have another look at the modification indices.

modindices(fit.satur, minimum.value = 3)
## [1] lhs      op       rhs      mi       epc      sepc.lv  sepc.all sepc.nox
## <0 rows> (or 0-length row.names)

The modification indices show that the model is saturated and thus, no further paths can be added to improve the model fit.

Report results

  • Annotated model graph
  • Absolute GOF statistics for final model: \(\chi²\) + \(p\)-value, comparative fit index (CFI) & degrees of freedom (df)
  • Table of raw coefficients and statistics
  • Table of total and indirect effects of interest
  • Computed queries as table or graph (ADD CALCULATION OF INDIRECT EFFECTS TO ADVANCED PART?)

[JBG material]

Model comparison

To evaluate whether the simple or the saturated model perform better globally, we can calculate the Akaike information criterion (AIC). A difference of two is considered as informative…. [Anderson]

## Only placeholder comparison
anova(fit.simple.up, fit.satur)
## Chi-Squared Difference Test
## 
##               Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## fit.satur      0 1838.2 1880.0 0.0000                              
## fit.simple.up  2 1834.3 1869.1 0.1428     0.1428       2     0.9311

References

Grace, James B. 2006. Structural Equation Modeling and Natural Systems. Cambridge University Press. https://doi.org/10.1017/CBO9780511617799.
Grace, James B., Donald R. Schoolmaster Jr., Glenn R. Guntenspergen, Amanda M. Little, Brian R. Mitchell, Kathryn M. Miller, and E. William Schweiger. 2012. “Guidelines for a Graph-Theoretic Implementation of Structural Equation Modeling.” Ecosphere 3 (8): art73. https://doi.org/https://doi.org/10.1890/ES12-00048.1.
Hu, Li‐tze, and Peter M. Bentler. 1999. “Cutoff Criteria for Fit Indexes in Covariance Structure Analysis: Conventional Criteria Versus New Alternatives.” Structural Equation Modeling: A Multidisciplinary Journal 6 (1): 1–55. https://doi.org/10.1080/10705519909540118.
Rosseel, Yves. 2012. “Lavaan: An r Package for Structural Equation Modeling.” Journal of Statistical Software, Articles 48 (2): 1–36. https://doi.org/10.18637/jss.v048.i02.